49 research outputs found

    Paradigm change and language contact: A framework of analysis and some speculation about the underlying cognitive processes

    This article contains some thoughts on the role of bilingual cognition in the diachronic change of morphological paradigms, with a focus on contact-induced change. In a first step, a general typology of paradigm change is proposed, based on a distinction between three levels of linguistic organization (the sign/Level 1, the category/Level 2, and the dimension/Level 3), and two types of change (neutralization and differentiation), thus distinguishing six types of paradigm change. Examples of these types (taken from the pertinent literature) are discussed, and two questions are addressed in each case: (i) To what extent does contact-induced paradigm change of a specific type differ from internal change? (ii) What are (potentially) the underlying cognitive processes motivating each type of change? The hypothesis is explored that there is a correlation between the three levels of analysis and three types of cognitive processes involved in paradigm change. It is suggested that change at Level 1 is typically based on analogy, change at Level 2 is often sensitive to frequency of use, and change at Level 3 may imply conceptual transfer, as discussed in recent work on weak relativity effects in the context of bilingual cognition

    Towards a corpus-based analysis of evaluative scales associated with even

    Scalar focus operators like even, only, etc. interact with scales, i.e., ordered sets of alternatives that are referenced by focus structure. The scaling dimensions interacting with focus operators have been argued to be semantic (e.g., entailment relations, probability) in earlier work, but it has been shown that purely semantic analyses are too restrictive, and that the specific scale that a given operator interacts with is often pragmatic, in the sense of being a function of the context. If that is true, the question arises what exactly determines the (types of) scales interacting with focus operators. The present study addresses this question by investigating the distributional behaviour of the additive scalar particle even relative to scales whose focus alternatives are ordered in terms of evaluative attitudes (positive, negative). Our hypothesis is that such evaluative attitudinal scales are at least partially functions of the lexical material in the sentential environment. This hypothesis is tested by determining correlations between sentence-level attitudes and lexically encoded attitudes in the relevant sentences. We use data from the Europarl corpus, a corpus of scripted and highly elaborated political speech, which is rich in argumentative discourse and thus lends itself to the study of attitudes in context. Our results show that there are in fact significant correlations between (manual) sentence-level evaluations and lexical evaluations (determined through machine learning) in the textual environment of the relevant operators. We conclude with an outlook on possible extensions of the method applied in the present study by identifying attitudinal patterns beyond the sentence, showing that positively and negatively connotated instances of even differ in terms of their argumentative function, with positive even often marking the climax and endpoint of an argument, while negative even often occurs in qualifying insertions like concessive parentheses. While we regard our results as valid, some refinements and extensions of the method are pointed out as necessary steps towards the establishment of an empirical sentence semantics, in the domain of scalar additive operators as well as more generally speaking

    Nouns, Verbs and Other Parts of Speech in Translation and Interpreting: Evidence from English Speeches Made in the European Parliament and Their German Translations and Interpretations

    This study investigates the distributions of word classes in English speeches made in the European Parliament and their German (written) translations and simultaneous interpretations. For comparison, a sample of original German speeches and a selection of political interviews are used. The study is motivated by the intention to understand the relationship between the type of mediation and communicative modes: mediated spoken language is compared to unmediated spoken language and to mediated written language. The results show that the interpretations exhibit a less nominal style than the translations, in this respect resembling unplanned spoken conversation. Other quantitative findings, such as a high frequency of adverbs, also point to a register effect, but interpretations have a hybrid status and can be located somewhere in the middle, between the register of the source text (parliamentary speech) and unplanned spoken discourse. The results are discussed against the background of the mechanisms that presumably underlie the choices made by translators (processing, register and strategies)

    Atomic: an open-source software platform for multi-level corpus annotation

    This paper presents Atomic, an open-source platform-independent desktop application for multi-level corpus annotation. Atomic aims at providing the linguistic community with a user-friendly annotation tool and sustainable platform through its focus on extensibility, a generic data model, and compatibility with existing linguistic formats. It is implemented on top of the Eclipse Rich Client Platform, a pluggable Java-based framework for creating client applications. Atomic - as a set of plug-ins for this framework - integrates with the platform and allows other researchers to develop and integrate further extensions to the software as needed. The generic graph-based meta model Salt serves as Atomic’s domain model and allows for unlimited annotation levels and types. Salt is also used as an intermediate model in the Pepper framework for conversion of linguistic data, which is fully integrated into Atomic, making the latter compatible with a wide range of linguistic formats. Atomic provides tools for both less experienced and expert annotators: graphical, mouse-driven editors and a command-line data manipulation language for rapid annotation

    Approximate Entropy in Canonical and Non-Canonical Fiction

    Computational textual aesthetics aims at studying observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and the distribution of part-of-speech-tags in windows of texts. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that the Approximate Entropy values can better differentiate canonical from non-canonical texts compared with Shannon Entropy, which is not true for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of local frequencies. We conclude that canonical fictional texts exhibit a higher degree of (sequential) unpredictability compared with non-canonical texts, corresponding to the popular assumption that they are more ‘demanding’ and ‘richer’. In using Approximate Entropy, we propose a new method for text classification in the context of computational textual aesthetics
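
The contrast drawn in this abstract between sequential structure (Approximate Entropy) and overall frequency distribution (Shannon Entropy) can be illustrated with a minimal sketch. It follows the standard textbook definition of Approximate Entropy and is not the authors' implementation. The function names, the defaults `m=2` and `r = 0.2 * standard deviation`, and the toy series are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def shannon_entropy(series):
    """Shannon entropy (in bits) of the value frequencies; ignores order."""
    counts = Counter(series)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def approximate_entropy(series, m=2, r=None):
    """Approximate Entropy of a numeric series; sensitive to order.

    m is the window length, r the tolerance. The default r (0.2 times the
    standard deviation) is a common convention, assumed here.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series too short for the chosen window length")
    if r is None:
        mean = sum(series) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
        r = 0.2 * sd

    def phi(dim):
        # All overlapping windows of length `dim`.
        windows = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for w in windows:
            # Count windows within tolerance r (Chebyshev distance),
            # including the window itself, so the count is never zero.
            matches = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


if __name__ == "__main__":
    # Hypothetical toy data: the same values in a regular and a shuffled order.
    regular = [1, 2] * 50
    shuffled = regular[:]
    random.Random(0).shuffle(shuffled)
    for name, data in (("regular", regular), ("shuffled", shuffled)):
        print(name, shannon_entropy(data), approximate_entropy(data))
```

Both toy series contain exactly the same values, so their Shannon Entropy is identical, while the shuffled series yields a higher Approximate Entropy than the strictly alternating one. In the setting of the abstract, the input would instead be a series such as sentence lengths per text.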

    Fractality and variability in canonical and non-canonical English fiction and in non-fictional texts

    This study investigates global properties of three categories of English text: canonical fiction, non-canonical fiction, and non-fictional texts. The central hypothesis of the study is that there are systematic differences with respect to structural design features between canonical and non-canonical fiction, and between fictional and non-fictional texts. To investigate these differences, we compiled a corpus containing texts of the three categories of interest, the Jena Corpus of Expository and Fictional Prose (JEFP Corpus). Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. We use four types of basic observations, (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity, and (iv) the distribution of topic probabilities in segments of texts. These basic observations are grouped into two more general categories, (a) the lower-level properties (i) and (ii), which are observed at the level of the sentence (reflecting linguistic decoding), and (b) the higher-level properties (iii) and (iv), which are observed at the textual level (reflecting comprehension/integration). The observations for each property are transformed into series, which are analyzed in terms of variance and subjected to Multi-Fractal Detrended Fluctuation Analysis (MFDFA), giving rise to three statistics: (i) the degree of fractality (H), (ii) the degree of multifractality (D), i.e., the width of the fractal spectrum, and (iii) the degree of asymmetry (A) of the fractal spectrum. The statistics thus obtained are compared individually across text categories and jointly fed into a classification model (Support Vector Machine). Our results show that there are in fact differences between the three text categories of interest. In general, lower-level text properties are better discriminators than higher-level text properties. Canonical fictional texts differ from non-canonical ones primarily in terms of variability in lower-level text properties. Fractality seems to be a universal feature of text, slightly more pronounced in non-fictional than in fictional texts. On the basis of our corpus-based results, we point out some avenues for future research leading toward a more comprehensive analysis of textual aesthetics, e.g., using experimental methodologies
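
As a rough illustration of how a fractality statistic such as H can be estimated from a series, the sketch below implements plain (monofractal) Detrended Fluctuation Analysis with linear detrending. This is a simplified stand-in and not the MFDFA procedure used in the study: it yields a single scaling exponent and no multifractal spectrum, so it gives no counterpart to D or A. The function names, the window sizes in `scales`, and the synthetic test series are assumptions.

```python
import math
import random


def _detrended_variance(segment):
    """Mean squared residual of a least-squares line fitted to the segment."""
    n = len(segment)
    x_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(segment))
    slope = sxy / sxx
    return sum(
        (y - y_mean - slope * (i - x_mean)) ** 2
        for i, y in enumerate(segment)
    ) / n


def dfa_exponent(series, scales=(16, 32, 64, 128, 256)):
    """Estimate the DFA scaling exponent of a numeric series.

    Assumes the series is not constant (a constant series has zero
    fluctuation, for which the logarithm is undefined).
    """
    mean = sum(series) / len(series)
    # Profile: cumulative sum of deviations from the mean.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)

    log_s, log_f = [], []
    for s in scales:
        n_windows = len(profile) // s
        if n_windows < 2:
            continue  # window too large for this series
        f2 = sum(
            _detrended_variance(profile[k * s:(k + 1) * s])
            for k in range(n_windows)
        ) / n_windows
        log_s.append(math.log(s))
        log_f.append(0.5 * math.log(f2))  # log of the RMS fluctuation

    if len(log_s) < 2:
        raise ValueError("series too short for the chosen scales")

    # Slope of log F(s) against log s gives the scaling exponent.
    s_mean = sum(log_s) / len(log_s)
    f_mean = sum(log_f) / len(log_f)
    num = sum((a - s_mean) * (b - f_mean) for a, b in zip(log_s, log_f))
    den = sum((a - s_mean) ** 2 for a in log_s)
    return num / den


if __name__ == "__main__":
    # Synthetic stand-in for a series such as sentence lengths.
    rng = random.Random(1)
    noise = [rng.gauss(0, 1) for _ in range(2048)]
    print("exponent for uncorrelated noise:", dfa_exponent(noise))
```

For uncorrelated noise the estimate should lie near 0.5, and values clearly above 0.5 point to long-range correlations of the kind the abstract refers to.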

    A Register-Based Study of Interior Monologue in James Joyce’s Ulysses

    While fictional orality (spoken language in fictional texts) has received some attention in the context of quantitative register studies at the interface of linguistics and literature, only a few attempts have been made so far to apply the quantitative methods of register studies to interior monologues (and other forms of inner speech or thought representation). This article presents a case study of the three main characters of James Joyce’s Ulysses whose thoughts are presented extensively in the novel, i.e., Leopold and Molly Bloom and Stephen Dedalus. Making use of quantitative, corpus-based methods, the thoughts of these characters are compared to fictional direct speech and (literary and non-literary) reference texts. We show that the interior monologues of Ulysses span a range of non-narrative registers with varying degrees of informational density and involvement. The thoughts of one character, Leopold Bloom, differ substantially from that character’s speech. The relative heterogeneity across characters is taken as an indication that interior monologue is used as a means of perspective taking and implicit characterization

    CIS: A Web-Based Course Information System

    This report surveys the design and implementation of CIS, a web-based Course Information System. CIS has been developed for the Computer Science I/II courses held between 2000 and 2003 by Prof. Dr. R. Loos, which were attended by 300 to 450 students. It maintains and presents each student's submissions and grades and holds related information such as worksheet texts, submission deadlines and the assignment of students to teaching assistants. In short, it covers most of the administrative data that comes up in regular university courses. CIS is designed to be convenient for first-year students to use. It aims at modelling real-world procedures, so that the system behaviour can be explained through well-known analogies. It is minimalistic, in the sense that it only takes on the routine work, while leaving the teacher free in any questions of structuring the contents of the course. Our problem statement and analysis focus on two aspects: the requirements on the central database and the interfaces for three groups of users: students, teaching assistants, and teachers/administrators. The actual implementation is straightforward, and we only mention particular decisions taken herein. CIS has been in use at the Wilhelm-Schickard Institut for three years, in courses organized both by the authors and others. The experiences indicate that the system can be considered reliable and mature. As the effort of setting up CIS is small, it has become feasible to employ it for several advanced courses with fewer than 20 students